-
Notifications
You must be signed in to change notification settings - Fork 666
Consolidate MapAccess
, and Subscript
into CompoundExpr
to handle the complex field access chain
#1551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consolidate MapAccess
, and Subscript
into CompoundExpr
to handle the complex field access chain
#1551
Conversation
CompoundExpr
to handle the complex field access chain
src/parser/mod.rs
Outdated
} else { | ||
let mut expr = self.parse_function(ObjectName(id_parts))?; | ||
// consume all period if it's a method chain | ||
if self.dialect.supports_methods() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try_parse_method
already does the if self.dialect.supports_methods()
check so that we should be able to skip the if condition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if this condition was removed, the test for method parsing will fail:
parse_method_expr
parse_method_select
Because the following parse_compound_expr
will try to consume all the dots for an expression, we need to parse the method here to avoid consume (
in parse_compound_expr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, to clarify what I meant was that try_parse_method
does this already
if !self.dialect.supports_methods() {
return Ok(expr);
}
So that this can be simplified as following (i.e without the extra if self.dialect.supports_methods()
)
expr = self.try_parse_method(expr)?;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. It's more simple. Thanks!
CompoundExpr
to handle the complex field access chain MapAccess
, and Subscript
into CompoundExpr
to handle the complex field access chain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @goldmedal and @iffyio
I didn't study the parser changes closely, but I reviewed the Expr
representation and left some comments.
In terms of the potential downstream impacts (and avoiding regressions), one thing that I suggest to help evaluate the potential downstream impact is of this trying it out in DataFusion and see if it works / passes all the tests;
You could potentially use apache/datafusion#13546 as a base to test with
Thanks, @alamb and @iffyio for reviewing.
Sounds good 👍 I'll test this PR based on it tomorrow. |
Created apache/datafusion#13734 for testing. The good news is that It only impacts a few parts of the planner and the unparser 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @goldmedal -- this looks good to me. 🙏
Given the potential for non trivial downstream conflicts due to this change (look at the list of conflicts it has already collected) I would like to merge it in right after we have done the next release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @goldmedal! Left some minor comments, the changes look good to me!
src/parser/mod.rs
Outdated
|
||
pub fn parse_map_access(&mut self, expr: Expr) -> Result<Expr, ParserError> { | ||
let key = self.parse_expr()?; | ||
let result = match key { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I was initially wondering about this match since it looked incorrect to restrict the key expression type. e.g. this BigQuery test case. But then I realised that test case is passing because it now takes a different codepath via parse_compound_field_access
. And so I wonder if there are any scenarios that rely on this method anymore, if there aren't it seems like we might be able to remove it entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. We can remove it entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed by 5c54d1b
src/parser/mod.rs
Outdated
} else { | ||
let mut expr = self.parse_function(ObjectName(id_parts))?; | ||
// consume all period if it's a method chain | ||
if self.dialect.supports_methods() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, to clarify what I meant was that try_parse_method
does this already
if !self.dialect.supports_methods() {
return Ok(expr);
}
So that this can be simplified as following (i.e without the extra if self.dialect.supports_methods()
)
expr = self.try_parse_method(expr)?;
src/parser/mod.rs
Outdated
} | ||
} | ||
|
||
fn exprs_to_idents(root: &Expr, fields: &[AccessExpr]) -> Option<Vec<Ident>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a description of what the function does?
src/parser/mod.rs
Outdated
} | ||
} | ||
|
||
fn exprs_to_idents(root: &Expr, fields: &[AccessExpr]) -> Option<Vec<Ident>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since the caller wouldn't need the copy of root
and fields
it would be nice to avoid the cloning? maybe something like this could work?
fn exprs_to_idens(root: Expr, fields: Vec<AccessExprs>) -> Result<Vec<Ident>, (Expr, Vec<AccessExprs>)>
so that if no conversion was made the caller gets back the same values they provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. I think we can save the cloning by separating it into two functions. One for check and another for transfer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed by 57830e2
src/parser/mod.rs
Outdated
if dialect_of!(self is SnowflakeDialect | MsSqlDialect) | ||
&& self.consume_tokens(&[Token::Plus, Token::RParen]) | ||
{ | ||
Ok(Expr::OuterJoin(Box::new( | ||
match <[Ident; 1]>::try_from(id_parts) { | ||
Ok([ident]) => Expr::Identifier(ident), | ||
Err(parts) => Expr::CompoundIdentifier(parts), | ||
}, | ||
))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice if we could reuse this logic somehow in a function? since its not trivial and lists dialects, now that its duplicated it could be easy to miss if changes are made to one but not the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed by 4b3818c
src/parser/mod.rs
Outdated
}) | ||
} | ||
|
||
/// Parse an multi-dimension array accessing like `[1:3][1][1]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Parse an multi-dimension array accessing like `[1:3][1][1]` | |
/// Parses a multi-dimension array accessing like `[1:3][1][1]` |
src/parser/mod.rs
Outdated
|
||
/// Parse an multi-dimension array accessing like `[1:3][1][1]` | ||
/// | ||
/// Parser is right after the first `[` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given this is a pub function it would be nice if the api was symmetric so that the function also consumed the leading [
? I think within the parser itself that would mean calling prev_token()
or replacing with peek()
before invoking this function which would be a fair price to pay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed by 67cd877
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @goldmedal!
@alamb @iffyio, |
Thanks for working on this @goldmedal! |
Description
close #1533
This PR introduces a new Expr,
Expr::CompoundExpr
, which handles the complex field access chain likea.b[1].c
orfunc()[1]
. As the previous discussion #1541 (comment), I found we have many similar expressions for the similar purpose (e.g.MapAccess
,Subscript
,JsonAcess
, ...).CompoundExpr
tries to represent a structure for all of them. As the suggestion from @iffyio (#1541 (comment)),CompoundExpr
is a linear structure, it has a root expression and a list of access fields, instead of a nested representation.To avoid too many breaking changes, this PR only replaces
Expr::MapAccess
andExpr::Subscript
withExpr::CompoundExpr
. Potentially, it can represent more things.Array
A 1-dim array
a[1]
will be represented likeA 2-dim array
a[1][2]
will be represented likeMap or Struct (Bracket-style)
A map
a['fied1']
will be represented likeA 2-dim map
a['field1']['field2']
will be represented likeStruct (Period-style) (only effect when the chain contains both
subscript
andexpr
)A struct access
a[field1].field2
will be representedIf a struct access likes
a.field1.field2
, it will be represented byCompoundIdentifer([a, field1, field2])
Breaking Change
Expr::Subscript
andExpr::MapAccess
are removed.Additional context